The use of support vector regression (SVR) for regression tasks has been on 
the increase over the past few years. Unfortunately, the practical application 
of SVR is limited by its dependence on the proper setting of its 
hyper-parameters and associated kernel parameter. It has therefore become 
imperative to devise a reliable and fast mechanism for determining the values 
of these parameters that could yield the lowest generalization error. This 
paper presents an SVR parameter optimization approach using the African 
buffalo optimisation (ABO) algorithm, i.e. SVR-ABO. Results obtained from 
several experiments show that the proposed ABO algorithm is capable of 
determining SVR hyper-parameters, which most of the time have to be set 
through estimation. 
1. INTRODUCTION 

Support vector machines (SVM) [1]-[7], a machine learning technique based on statistical learning 
theory [6], [8], have gained popularity in both classification and regression tasks owing to properties 
that lead to promising generalization performance [6], [9]. Among the most distinctive features of SVR are 
its ability to model non-linear relationships among data samples, the independence of its generalization 
from input-space dimensionality, and its convergence to a global solution owing to its quadratic 
formulation. Unlike artificial neural networks (ANN), SVR overcomes the overfitting problem by adopting the 
structural risk minimization (SRM) approach, which focuses on minimizing the generalization error instead of 
the training error [10]. 

However, despite these advantages, the performance of SVR depends on appropriate settings of 
three free parameters, namely the regularization parameter C, the tube size ε and the kernel 
parameter γ [11], [12]. At present, no definitive method of selecting these free parameters can be 
found in the literature. Different approaches have been proposed; however, most of them depend on the 
user's domain knowledge and expertise or on trial and error, while some are even contradictory [13]. 
Hence, there is a lack of certainty about the optimality of the parameter values obtained. The problem of 
SVR parameter selection is exacerbated by the fact that the three parameters have to be determined 
simultaneously for SVR to achieve good generalization ability. For these reasons, many researchers opt for 
grid search or cross-validation (CV) methods as an alternative for determining optimal SVR parameters. 
However, grid search and CV methods have been reported to be computationally expensive and time-consuming, 
and mostly to yield high error rates [14]-[16]. Hence, grid search and CV methods are not considered 
suitable choices for determining SVR parameters. 

This has led researchers to explore swarm intelligence (SI) methods for optimizing the parameters of 
the SVR algorithm, as established in the literature [2], [3], [5], [17]-[21]. SI algorithms such as particle 
swarm optimization (PSO) have been widely used for parameter optimization of several algorithms, including 
SVR, with promising results. Recently, a new SI-based optimisation algorithm, African buffalo 
optimisation (ABO), was introduced to the research community by Odili and Noraziah [22]. The ABO 
algorithm has been used as an optimizing agent in various areas, such as the travelling 
salesman problem (TSP) [23], selection of biodiversity conservation areas under a given constraint [24] and 
parameter tuning of PID controllers [25]. In all these cases, ABO was reported to outperform the similar 
algorithms it was compared with, owing to its fast convergence, its ability to track the best position and 
speed of each buffalo, and the movement of the best buffalo towards better exploration [26]. However, the 
performance of ABO in optimizing three mutually dependent parameters simultaneously, as in the case of SVR, 
has not been recorded in the literature. Hence, this study proposes to hybridize SVR with ABO as an 
optimization algorithm to determine the optimal parameters of SVR. 

The main objective of this paper is to use the ABO algorithm to automatically determine optimal values 
of the SVR parameters. The developed hybrid algorithm achieved higher forecasting accuracy than the 
standard practice of using the default SVR parameters. The remaining sections of this paper 
present the processes involved in developing the SVR-ABO algorithm. 


2. THEORETICAL CONCEPT 
2.1. Support vector regression (SVR) 

This section presents the theory behind the SVR equations as given by [27], [28]. Consider a regression 
problem defined on the dataset in (1). 


$$D = \{(p_1, q_1), (p_2, q_2), \ldots, (p_n, q_n)\} \tag{1}$$


where D represents the dataset, $p \in P \subseteq \mathbb{R}^n$ are the training inputs and $q \in Q \subseteq \mathbb{R}$ are the 
training outputs. The main objective is to determine a function that approximates the relationship between 
the input variable(s) p and the associated target variable q. The function can later be used to infer the 
value of the target variable q for new input data. For any regression function f(p), there is a loss 
function L that measures the deviation of the function's output from the actual value. In this paper we 
adopt the commonly used ε-insensitive loss function described by Gunn [29], formulated as (2). 


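$$L(q, f(p)) = \begin{cases} 0 & \text{if } |q - f(p)| < \varepsilon \\ |q - f(p)| - \varepsilon & \text{otherwise} \end{cases} \tag{2}$$

Assuming the function f above is linear, it is represented as (3): 

$$f(p) = w \cdot p + b \tag{3}$$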


where p is an element of the input space P, w is the weight vector, b is the bias and $w \cdot p$ 
represents the dot product of the vectors w and p. The primary objective of the formulated regression 
function is to fit the data points with a function that is as flat as possible. In the case of (3), flatness 
is achieved by making w as small as possible. One way to achieve this is by minimizing the squared norm, 
i.e. $\|w\|^2$. The regression problem can thus be formulated as a convex optimization problem as 
follows: 


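$$\text{minimise: } \frac{1}{2}\|w\|^2 \tag{4}$$

$$\text{subject to: } \begin{cases} q_i - (w^{T}\phi(p_i) + b) \le \varepsilon \\ q_i - (w^{T}\phi(p_i) + b) \ge -\varepsilon \end{cases} \tag{5}$$

Hence, (4) can be represented as (6), (7). 

$$\text{minimise: } \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}(\xi_i + \xi_i^*) \tag{6}$$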
The constant C determines the amount of flatness that is traded off against 
deviations larger than the tolerable value ε. This trade-off corresponds to 
the ε-insensitive loss function described earlier, which is represented in (6) by the terms $(\xi_i + \xi_i^*)$, 
also called empirical terms. The empirical terms form the ε-tube, which is the maximum 
acceptable deviation. All data points that fall within the range of allowable deviation have zero-valued 
coefficients and hence make no contribution to the solution. 
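
The role of C and ε in the objective can be illustrated with a minimal Python sketch. It assumes the linear model of (3) and uses the standard observation that, at the optimum, the slack terms equal the ε-insensitive loss of (2). The function names and the toy data are hypothetical and are not taken from the experiments in this paper.

```python
def epsilon_insensitive_loss(q, f_p, eps):
    """Loss of eq. (2): zero inside the tube, linear outside it."""
    deviation = abs(q - f_p)
    return 0.0 if deviation < eps else deviation - eps


def linear_predict(w, p, b):
    """Linear model of eq. (3): f(p) = w . p + b."""
    return sum(w_j * p_j for w_j, p_j in zip(w, p)) + b


def svr_objective(w, b, inputs, targets, C, eps):
    """0.5 * ||w||^2 + C * (sum of epsilon-insensitive losses)."""
    flatness = 0.5 * sum(w_j * w_j for w_j in w)
    empirical = sum(
        epsilon_insensitive_loss(q, linear_predict(w, p, b), eps)
        for p, q in zip(inputs, targets)
    )
    return flatness + C * empirical


# Hypothetical toy data, not taken from the paper.
inputs = [[0.0], [1.0], [2.0]]
targets = [0.1, 1.0, 2.5]
value = svr_objective(w=[1.0], b=0.0, inputs=inputs, targets=targets,
                      C=10.0, eps=0.2)
```

In this toy case only the third point lies outside the tube, so it alone contributes to the empirical term; increasing C makes that deviation more expensive relative to flatness.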

Upon application of the Karush-Kuhn-Tucker (KKT) conditions as a means of finding the Lagrangian 
solution, all products of the dual variables vanish at the solution point, thus turning (3) into (8). 

$$f(p) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*)\,K(p_i, p) + b \tag{8}$$

where the $p_i$ with non-zero coefficients $(\alpha_i - \alpha_i^*)$ are the support vectors. 
Various kernel functions can be found in the literature; however, the radial basis function (RBF) has been 
established as the most appropriate for regression tasks [4], [27]. It is represented as (9): 


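$$K(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^2\right) \tag{9}$$

where γ represents a positive non-zero kernel parameter and $x_i$, $x_j$ denote input vectors. Hence, (8) can be reduced to (10). 

$$f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*)\,K(x, x_i) + b \tag{10}$$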
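
As a hedged illustration of the kernel expansion, the following Python sketch evaluates an RBF kernel of the form exp(-γ‖x_i - x_j‖²) and the resulting prediction function. The support vectors, coefficients and bias are hypothetical values chosen for the example; in practice they come from solving the dual problem.

```python
import math


def rbf_kernel(x_i, x_j, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, dual_coefs, b, gamma):
    """Kernel expansion: sum_i (alpha_i - alpha_i*) K(x, x_i) + b."""
    return b + sum(
        coef * rbf_kernel(x, sv, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )


# Hypothetical support vectors and coefficients (alpha_i - alpha_i*).
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [0.5, -0.25]
prediction = svr_predict([0.0, 0.0], support_vectors, dual_coefs,
                         b=0.1, gamma=0.5)
```

Only points with non-zero coefficients enter the sum, which is why the number of support vectors determines the cost of a prediction.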


The performance of the SVR algorithm depends on appropriate settings of its hyper-parameters, viz. the 
regularization constant C, the error-sensitivity parameter ε and the kernel parameter γ [16]. The 
regularization parameter determines the tolerable bound of deviations larger than the defined value of the 
error-sensitivity parameter, and hence the degree of model complexity. Inappropriate selection of C could 
lead to an imbalance between model complexity minimization (MCM) and empirical risk minimization (ERM). The 
error-sensitivity parameter (ε) defines the width of the ε-insensitive zone and thereby influences the 
number of support vectors (SV). A large error-sensitivity value places a significant number of data points 
inside the insensitive zone, leaving fewer support vectors and hence producing undesirable regression 
estimates. The kernel parameter (γ) determines the width of the RBF kernel. With the kernel written as 
exp(-γ‖x_i - x_j‖²), a small value of γ gives a wide kernel and an inflexible function that is unsuitable 
for approximating complex functions, while a large value of γ gives a narrow kernel and an over-flexible 
function that could result in overfitting. 


2.2. African buffalo optimization algorithm 

African buffalo optimisation (ABO) is a swarm-intelligence-based algorithm characterized by 
high-speed convergence, developed by Odili et al. [30]. The algorithm is based on the foraging and 
herd-defending behaviour exhibited by wild African buffaloes [30], [31]. These wild animals exhibit 
exceptional organizing behaviour, on which the ABO algorithm is modelled [23], [32]. In the ABO algorithm 
as presented in (11) and (12), $w_k$ represents the "waaa" sound, $m_k$ represents the "maaa" sound, and 
$l_1$ and $l_2$ are the two learning parameters. Other distinct parameters of the algorithm are the global 
best ($bg_{max}$) and personal best ($bp_{max(k)}$) positions. The basic ABO algorithm is controlled by the 
democratic and location update equations, represented by (11) and (12) respectively. The algorithm operates 
by subtracting the "waaa" value ($w_k$) from both the global best ($bg_{max}$) and the personal best 
($bp_{max(k)}$), each weighted by a learning parameter, in order to direct the herd either to explore the 
search space for better pasture or to retain its present position and continue grazing [30]. 

$$m_{k+1} = m_k + l_1\,(bg_{max} - w_k) + l_2\,(bp_{max(k)} - w_k) \tag{11}$$

$$w_{k+1} = \frac{w_k + m_k}{\pm 0.5} \tag{12}$$
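
A single update step of one buffalo can be sketched in Python as follows. The democratic update follows the description above (the "waaa" value is subtracted from both the global and the personal best). The location update is written here as the sum of w_k and m_k divided by a constant `lam`; that form, the name `lam` and the numeric values used are assumptions made for illustration only.

```python
def abo_update(m_k, w_k, bg_max, bp_max_k, l1, l2, lam):
    """One ABO step for a single buffalo, applied per dimension.

    m_next follows the democratic equation; w_next divides (w_k + m_k)
    by lam, which is an assumed form of the location update.
    """
    m_next = [
        m + l1 * (bg - w) + l2 * (bp - w)
        for m, w, bg, bp in zip(m_k, w_k, bg_max, bp_max_k)
    ]
    w_next = [(w + m) / lam for w, m in zip(w_k, m_k)]
    return m_next, w_next


# Hypothetical one-dimensional example.
m_next, w_next = abo_update(
    m_k=[1.0], w_k=[2.0], bg_max=[4.0], bp_max_k=[3.0],
    l1=0.5, l2=0.25, lam=2.0,
)
```

Both best positions lie above the current "waaa" value in this example, so the democratic update pushes m upward by an amount controlled by the two learning parameters.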
3. METHOD 


The methodology for developing the proposed hybrid technique comprises several stages, covering 
the procedure followed and the datasets used to build and test the model. The proposed 
hybrid model was developed in the Python programming language with the NumPy and pandas libraries. The 
scikit-learn implementation of SVR, which is based on LibSVM, was used to build the SVR model. The detailed 
methodology is described in the following sub-sections. 


3.1. Dataset description 

Four (4) multivariate historical datasets were used to test the accuracy of the developed hybrid 
algorithm. The datasets were obtained using the Quandl API. In order to test the developed model 
rigorously, we used four datasets with varying properties, including different numbers of variables and 
instances. The Google and Amazon datasets have twelve (12) variables each, while the Tesla and Yahoo 
datasets have six (6) variables each. In terms of size, we opted to include the Tesla dataset, which has 
only 251 instances, in order to test the performance of the hybrid model on a small dataset. The datasets 
are described in Table 1. 


Table 1. Properties of the datasets 


S/No  Dataset name   Duration                No. of instances  No. of variables
1     Google Stock   21/03/2018-19/08/2004   3,424             12
2     Tesla Stock    18/03/2019-23/03/2018   251               6
3     Yahoo stock    11/12/2019-03/01/2012   2,003             6
4     Amazon Inc     16/05/1997-21/03/2018   5,248             12


The Tesla Inc dataset consists of two hundred and fifty-one (251) instances of daily recordings. The 
Yahoo Inc dataset consists of two thousand and three (2,003) instances of daily recordings. The Amazon Inc 
dataset consists of five thousand two hundred and forty-eight (5,248) instances, while the Google 
Inc dataset consists of three thousand four hundred and twenty-four (3,424) instances. 

Each of the datasets was partitioned into three (3) parts, viz. 70% for training, 15% for validation 
during training and the remaining 15% for testing the model. After partitioning, each dataset 
was scaled to values between -1 and 1 in order to remove differences in feature magnitude and so prevent 
features with larger magnitudes from dominating. The training and validation datasets were scaled 
separately from the testing dataset in order to avoid data leakage between the model training and testing 
phases. 
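
The partitioning and scaling steps can be sketched as below. This is an illustration under stated assumptions rather than the exact procedure used: the split is assumed to be chronological (no shuffling), and the scaling bounds are assumed to be computed on the training part only and then reused, which is one common way of avoiding leakage. All function names and the toy data are hypothetical.

```python
def chronological_split(rows, train_pct=70, val_pct=15):
    """Split rows into train/validation/test parts without shuffling."""
    n = len(rows)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    return (rows[:n_train],
            rows[n_train:n_train + n_val],
            rows[n_train + n_val:])


def fit_minmax(rows):
    """Per-column (min, max) computed on the given rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale_to_unit_range(rows, bounds):
    """Map each column to [-1, 1] using previously fitted bounds."""
    return [
        [0.0 if hi == lo else 2.0 * (x - lo) / (hi - lo) - 1.0
         for x, (lo, hi) in zip(row, bounds)]
        for row in rows
    ]


# Hypothetical data: 20 rows with two features of different magnitude.
rows = [[float(i), float(10 * i)] for i in range(20)]
train, val, test = chronological_split(rows)
bounds = fit_minmax(train)  # fitted on the training part only
train_s = scale_to_unit_range(train, bounds)
test_s = scale_to_unit_range(test, bounds)
```

With bounds fitted on the training part, later values of a trending series can fall outside [-1, 1], as happens for the test rows in this toy example.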


3.2. Evaluation metrics 

In order to evaluate the performance of the developed hybrid model, we use mean absolute 
percentage error (MAPE) and root mean squared error (RMSE) as the two regression evaluation metrics for 
measuring the accuracy of all developed models. For both metrics, the smaller the value, the better the 
performance of the developed model. These performance metrics are represented mathematically in (13) and 
(14) respectively. 


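$$MAPE = \frac{1}{N}\sum_{n=1}^{N}\left|\frac{y_n - \hat{y}_n}{y_n}\right| \times 100\% \tag{13}$$

where N represents the number of observations, and $y_n$ and $\hat{y}_n$ represent the n-th observed and 
forecast values respectively. 

$$RMSE = \sqrt{\frac{\sum_{n=1}^{N}(y_n - \hat{y}_n)^2}{N}} \tag{14}$$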
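
Both metrics are straightforward to compute. The following Python sketch implements them directly from their definitions; the observed and forecast values are hypothetical. Note that MAPE is undefined when an observed value is zero, which this sketch does not guard against.

```python
import math


def mape(observed, forecast):
    """Mean absolute percentage error, in percent."""
    n = len(observed)
    return 100.0 * sum(abs((y - f) / y) for y, f in zip(observed, forecast)) / n


def rmse(observed, forecast):
    """Root mean squared error, in the units of the target variable."""
    n = len(observed)
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(observed, forecast)) / n)


# Hypothetical observed and forecast values.
observed = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
mape_value = mape(observed, forecast)
rmse_value = rmse(observed, forecast)
```

MAPE is scale-free, whereas RMSE is expressed in the units of the target and penalizes large individual errors more heavily.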


3.3. The Hybrid SVR-ABO algorithm 

As pointed out in the previous sections, determining the three (3) parameters of SVR is difficult; 
in our proposed hybrid algorithm we therefore hybridized the SVR algorithm with the ABO 
algorithm. ABO is used iteratively to find optimal values for the SVR algorithm's 
parameters. The process starts by initializing the buffalo population in the search space using the 
Mersenne Twister algorithm for better randomization. The fitness of each buffalo is determined based on 
its position in the search space. The buffalo that returns the lowest prediction error, based on the RMSE 
metric, is considered the best for that iteration. The overall best among the best buffaloes produced at 
each iteration is considered the global best. The training process was conducted with a population size of 
two hundred (200) and a maximum of one thousand (1,000) iterations. 

Determination of support vector regression parameters using african buffalo ... (Inusa Sani Maijama'a) 

ISSN: 2502-4752 

Upon reaching the termination criterion, the position of the overall best buffalo is 
taken as the required optimal hyper-parameters for the SVR model. The SVR with optimized 
parameters was then used to forecast the test dataset. Algorithm 1 illustrates the SVR-ABO forecasting 
process. 


Algorithm 1: SVR-ABO Algorithm 

Input: Training data, P, D, MaxI, l1, l2 
/* P = number of individual buffaloes (population size), D = problem dimension (SVR 
control parameters), MaxI = maximum number of iterations, l1 = cognitive learning 
parameter, l2 = social learning parameter */ 
Output: Optimal values for SVR (C, γ and ε) as the global best buffalo position 

For buffalo i = 0 to P do: 
    Randomly initialise buffalo position vector m_i with three (3) values 
        based on the [C, γ and ε] ranges 
    Randomly initialise buffalo movement vector w_i 
End For 
Initialise t = 1 
While (t ≠ MaxI) do: 
    For each buffalo i do: 
        Calculate fitness_value using the SVR regressor 
        If buffalo's fitness_value is better than bpmax(i): 
            Set bpmax(i) = buffalo's current fitness 
        End If 
    End For 
    Set bgmax = best previous buffalo's fitness value 
    /* Updating each buffalo's movement and position */ 
    For buffalo i = 0 to P do: 
        For dimension d = 0 to D do: 
            m_{i,d}^{t+1} = m_{i,d}^{t} + l1 (bgmax - w_{i,d}) + l2 (bpmax(i) - w_{i,d}) 
        End For 
    End For 
    Set t = t + 1 
End While 
Evaluate the solution on the testing set 
Result: The forecast values and performance measurements on the testing set 
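
The loop of Algorithm 1 can be sketched in Python as below. This is a minimal illustration under several assumptions, not the implementation used in the experiments: the objective is a hypothetical stand-in for the validation RMSE of an SVR (in practice it would fit, for example, scikit-learn's `SVR(kernel="rbf", C=..., gamma=..., epsilon=...)` and score it on the validation split); the movement update divides by an assumed constant `lam`; positions are clipped to the search ranges; and the ranges, learning parameters, population size and iteration count are illustrative values.

```python
import math
import random


def abo_minimise(objective, bounds, pop_size=20, max_iter=50,
                 l1=0.3, l2=0.3, lam=2.0, seed=0):
    """Minimise `objective` over a box with an ABO-style search."""
    rng = random.Random(seed)  # CPython's generator is a Mersenne Twister

    def sample():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def clip(vec):
        # Assumption: keep every coordinate inside its search range.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(vec, bounds)]

    m = [sample() for _ in range(pop_size)]   # buffalo positions
    w = [sample() for _ in range(pop_size)]   # buffalo movement vectors
    bp = [list(pos) for pos in m]             # personal best positions
    bp_fit = [math.inf] * pop_size            # personal best fitness
    bg, bg_fit, history = None, math.inf, []

    for _ in range(max_iter):
        # Evaluate every buffalo and refresh its personal best.
        for i in range(pop_size):
            fit = objective(m[i])
            if fit < bp_fit[i]:
                bp[i], bp_fit[i] = list(m[i]), fit
        # The global best is the best personal best found so far.
        best = min(range(pop_size), key=lambda i: bp_fit[i])
        bg, bg_fit = list(bp[best]), bp_fit[best]
        history.append(bg_fit)
        # Move the herd: democratic update, then assumed location update.
        for i in range(pop_size):
            m_new = clip([
                m_d + l1 * (g_d - w_d) + l2 * (p_d - w_d)
                for m_d, w_d, g_d, p_d in zip(m[i], w[i], bg, bp[i])
            ])
            w[i] = clip([(w_d + m_d) / lam for w_d, m_d in zip(w[i], m[i])])
            m[i] = m_new
    return bg, bg_fit, history


def toy_validation_error(params):
    """Hypothetical stand-in for the validation RMSE of an SVR."""
    c, gamma, eps = params
    return ((math.log10(c) - 1.0) ** 2
            + (math.log10(gamma) + 2.0) ** 2
            + (eps - 0.1) ** 2)


# Assumed search ranges for C, gamma and epsilon.
bounds = [(0.1, 1000.0), (1e-4, 1.0), (0.01, 1.0)]
best_params, best_error, history = abo_minimise(toy_validation_error, bounds)
```

Because the global best is only replaced by a strictly better personal best, the recorded best error never increases from one iteration to the next, whatever the quality of the final solution.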


4. RESULT AND DISCUSSION 

The model developed from the hybrid algorithm was tested using four (4) different stock market 
historical datasets. The test datasets described in the methodology section were used to assess the 
performance of the developed model based on RMSE and MAPE as the two statistical evaluation metrics. The 
results obtained from the developed model were compared with those obtained from the classical SVR, PSO and 
GA algorithms, all with default parameter settings. The findings show that the ABO algorithm can find 
better parameters for the SVR algorithm than the other benchmarked algorithms. 

Table 2 compares the performance of the proposed SVR-ABO algorithm against the classical SVR, 
SVR-PSO and SVR-GA algorithms. On the Amazon dataset, SVR-ABO achieves 96.67% accuracy based on MAPE, 
while classical SVR, SVR-PSO and SVR-GA recorded 96.09%, 96.48% and 96.45% 
respectively. The algorithms recorded RMSE values of 36.4381 for classical SVR, 34.5642 for SVR-GA, 
33.6503 for SVR-ABO and 33.9624 for SVR-PSO. 

On the Google dataset, the algorithms recorded RMSE values of 35.3514, 48.9568, 33.2592 
and 33.5197 for classical SVR, SVR-GA, SVR-ABO and SVR-PSO respectively. On the MAPE metric, 
SVR-ABO recorded an accuracy of 96.65%, while classical SVR, SVR-PSO and SVR-GA recorded accuracies of 
96.28%, 96.50% and 95.45% respectively. 

On the Tesla test data, SVR-ABO achieved the highest performance, with an accuracy of 96.83%, 
while SVR-PSO, SVR-GA and classical SVR achieved accuracies of 96.40%, 96.82% and 95.81% respectively. 
The algorithms recorded RMSE values of 35.7663 for classical SVR, 34.7329 for SVR-GA, 34.7005 for 
SVR-ABO and 35.0092 for SVR-PSO. 

Lastly, on the Yahoo test data, the algorithms recorded RMSE values of 35.9359, 33.9286, 32.2832 and 
32.9706 for classical SVR, SVR-GA, SVR-ABO and SVR-PSO respectively. On the MAPE metric, 
SVR-ABO recorded an accuracy of 96.21%, while classical SVR, SVR-PSO and SVR-GA recorded accuracies of 
95.75%, 96.04% and 96.02% respectively. 


Table 2. Evaluation of SVR and SVR-ABO over four (4) datasets based on RMSE and MAPE 


          RMSE                                   MAPE (%)
Dataset   SVR      SVR-GA   SVR-ABO  SVR-PSO    SVR     SVR-GA   SVR-ABO  SVR-PSO
Amazon    36.4381  34.5642  33.6503  33.9624    3.9074  3.5491   3.3350   3.5164
Google    35.3514  48.9568  33.2592  33.5197    3.7185  4.5537   3.4799   3.5019
Tesla     35.7663  34.7329  34.7005  35.0092    4.1859  3.17601  3.1741   3.6050
Yahoo     35.9359  33.9286  32.2832  32.9706    4.2337  3.9786   3.7912   3.9628


Figures 1-4 show the forecasting results obtained by the proposed algorithm against the other 
benchmarked algorithms on the testing data reserved for evaluation. Figure 1 shows the predicted values 
against the actual values on the Amazon dataset for a period of one month. Figure 2 compares the 
performance of the algorithms on the Google test data. Figure 3 shows the performance of the algorithms on 
the Tesla test data over the last three (3) months. Lastly, Figure 4 compares the developed hybrid 
algorithm with the other algorithms on the Yahoo test data. 

Figure 2. SVR-ABO performance on Google dataset 

5. CONCLUSION AND FUTURE WORK 

In this paper, a new method of determining the optimal parameter values of the SVR algorithm based on 
the African buffalo optimisation (ABO) algorithm has been proposed. The proposed method achieves higher 
generalization accuracy than classical SVR with default parameter settings. 
It has been tested using stock price data of selected companies. The time series 
datasets were chosen because of their inherent non-linearity and the dependence of each day's closing 
stock price on several variables (multivariate). The key feature of the proposed approach is its ability 
to exploit the power of swarm intelligence to automatically determine optimal parameter values for the 
classical SVR algorithm. Hence, the method is capable of increasing the generalization ability of SVR, 
thereby increasing model accuracy. Future work will explore the performance of the developed algorithm 
on datasets from other domains. 